
    Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis

    The purpose of this study is to investigate how humans interpret musical scores expressively, and then to design machines that sing like humans. We consider six factors that have a strong influence on the expression of human singing. The factors are related to the acoustic, phonetic, and musical features of a real singing signal. Given real singing voices recorded following the MIDI scores and lyrics, our analysis module can extract the expression parameters from the real singing signals semi-automatically. The expression parameters are used to control the singing voice synthesis (SVS) system for Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The results of perceptual experiments show that integrating the expression factors into the SVS system yields a notable improvement in perceptual naturalness, clearness, and expressiveness. By one-to-one mapping of the real singing signal and expression controls to the synthesizer, our SVS system can simulate the interpretation of a real singer with the timbre of a speaker. Comment: 8 pages, technical report
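    The abstract includes no code; purely as a rough illustration of the harmonic-plus-noise idea (not the authors' system), the sketch below synthesizes a single frame as a sum of sinusoids at integer multiples of F0 plus a white-noise component. All function names and parameter values are hypothetical.

```python
import numpy as np

def hnm_synthesize_frame(f0, harmonic_amps, noise_gain, sr=44100, frame_len=2048):
    """Synthesize one frame as a harmonic part (sinusoids at k*f0) plus a noise part."""
    t = np.arange(frame_len) / sr
    harmonic_part = np.zeros(frame_len)
    for k, amp in enumerate(harmonic_amps, start=1):
        if k * f0 < sr / 2:                         # keep harmonics below Nyquist
            harmonic_part += amp * np.sin(2 * np.pi * k * f0 * t)
    noise_part = noise_gain * np.random.randn(frame_len)
    return harmonic_part + noise_part

# Example: a 220 Hz note with six decaying harmonics and a weak noise floor.
frame = hnm_synthesize_frame(220.0, harmonic_amps=[0.5 / k for k in range(1, 7)], noise_gain=0.01)
```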

    Affective Music Information Retrieval

    Much of the appeal of music lies in its power to convey emotions/moods and to evoke them in listeners. In consequence, the past decade witnessed a growing interest in modeling emotions from musical signals in the music information retrieval (MIR) community. In this article, we present a novel generative approach to music emotion modeling, with a specific focus on the valence-arousal (VA) dimension model of emotion. The presented generative model, called acoustic emotion Gaussians (AEG), better accounts for the subjectivity of emotion perception by the use of probability distributions. Specifically, it learns from the emotion annotations of multiple subjects a Gaussian mixture model in the VA space with prior constraints on the corresponding acoustic features of the training music pieces. Such a computational framework is technically sound, capable of learning in an online fashion, and thus applicable to a variety of applications, including user-independent (general) and user-dependent (personalized) emotion recognition and emotion-based music retrieval. We report evaluations of the aforementioned applications of AEG on a large-scale emotion-annotated corpus, AMG1608, to demonstrate the effectiveness of AEG and to showcase how evaluations are conducted for research on emotion-based MIR. Directions of future work are also discussed. Comment: 40 pages, 18 figures, 5 tables, author version
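    As a loose illustration of Gaussian-mixture modeling in the valence-arousal plane (a much simplified stand-in for AEG, which additionally conditions on acoustic features), the sketch below fits a small GMM to hypothetical per-subject VA annotations with scikit-learn; the annotation values are made up.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical valence-arousal annotations (in [-1, 1]) from several subjects for one clip.
va_annotations = np.array([
    [0.60, 0.40], [0.50, 0.50], [0.70, 0.30], [0.40, 0.60],
    [0.55, 0.45], [0.65, 0.35], [0.45, 0.55], [0.58, 0.42],
])

# Fit a small Gaussian mixture over the VA plane; the component means and covariances
# summarize both the consensus emotion and the disagreement (subjectivity) across subjects.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(va_annotations)

print("component means:", gmm.means_)
print("component weights:", gmm.weights_)
```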

    An Output-Recurrent-Neural-Network-Based Iterative Learning Control for Unknown Nonlinear Dynamic Plants

    We present a design method for an iterative learning control system using an output recurrent neural network (ORNN). Two ORNNs are employed to build the learning control structure. The first ORNN, called the output recurrent neural controller (ORNC), is used as an iterative learning controller to achieve the learning control objective. To guarantee convergence of the learning error, some information about plant sensitivity is required to design a suitable adaptive law for the ORNC. Hence, a second ORNN, called the output recurrent neural identifier (ORNI), is used as an identifier to provide the required information. All the weights of the ORNC and ORNI are tuned during the control iteration and identification process, respectively, to achieve the desired learning performance. The adaptive laws for the weights of the ORNC and ORNI, and the analysis of learning performance, are derived via a Lyapunov-like analysis. It is shown that the identification error converges asymptotically to zero and that the repetitive output tracking error converges asymptotically to zero except for the initial resetting error.
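    The paper's adaptive law is ORNN-based; purely to illustrate the iteration-domain learning idea behind iterative learning control, the sketch below runs a classic P-type update on a toy plant (not the authors' method). The plant, learning gain, and reference trajectory are invented for the example.

```python
import numpy as np

def p_type_ilc(plant, reference, n_iterations=20, gain=0.5):
    """Classic P-type iterative learning control: u_{k+1}(t) = u_k(t) + gain * e_k(t)."""
    u = np.zeros_like(reference)
    for _ in range(n_iterations):
        y = plant(u)                 # run one repetition of the (unknown) plant
        e = reference - y            # tracking error over the whole trajectory
        u = u + gain * e             # update the control input for the next repetition
    return u

# Toy example: a static gain-0.8 "plant"; over iterations the learned input makes y track the reference.
reference = np.sin(np.linspace(0, 2 * np.pi, 100))
u_final = p_type_ilc(lambda u: 0.8 * u, reference)
```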

    SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System

    Singing voice beat and downbeat tracking has several applications in automatic music production, analysis, and manipulation. Some of them require real-time processing, such as live performance processing and auto-accompaniment for singing inputs. The task is challenging owing to the non-trivial rhythmic and harmonic patterns in singing signals. Real-time processing introduces further constraints, such as the inaccessibility of future data and the impossibility of correcting earlier results that turn out to be inconsistent with later ones. In this paper, we introduce the first system that tracks the beats and downbeats of singing voices in real time. Specifically, we propose a novel dynamic particle filtering approach that incorporates offline historical data to correct the online inference by using a variable number of particles. We evaluate the performance on two datasets: GTZAN with separated vocal tracks, and an in-house dataset with the original vocal stems. Experimental results demonstrate that our proposed approach outperforms the baseline by 3-5%. Comment: Accepted to the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
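    The authors' dynamic particle filter with a variable particle count is more involved than what can be shown here; the sketch below only illustrates a generic online particle-filter update for beat phase and period, with a toy observation model and hypothetical parameter values.

```python
import numpy as np

def particle_filter_step(particles, weights, onset_strength, hop_time=0.01):
    """One online update for beat-tracking particles; each particle is (phase, period) in seconds."""
    # Propagate: advance the beat phase by one frame hop and jitter the period slightly.
    particles[:, 0] = (particles[:, 0] + hop_time) % particles[:, 1]
    particles[:, 1] += 0.001 * np.random.randn(len(particles))

    # Weight: particles whose phase lies near a beat boundary gain weight
    # proportional to the observed onset strength of the current frame.
    near_beat = np.minimum(particles[:, 0], particles[:, 1] - particles[:, 0]) < 0.025
    weights = weights * (1.0 + onset_strength * near_beat)
    weights /= weights.sum()

    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles, weights = particles[idx], np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Hypothetical usage: 500 particles with periods near 0.5 s (120 BPM) and uniform phases.
n = 500
particles = np.column_stack([np.random.rand(n) * 0.5, 0.5 + 0.05 * np.random.randn(n)])
weights = np.full(n, 1.0 / n)
particles, weights = particle_filter_step(particles, weights, onset_strength=0.8)
```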

    Music Source Separation with Band-Split RoPE Transformer

    Music source separation (MSS) aims to separate a music recording into multiple musically distinct stems, such as vocals, bass, drums, and more. Recently, deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used, but the improvement is still limited. In this paper, we propose a novel frequency-domain approach based on a Band-Split RoPE Transformer (called BS-RoFormer). BS-RoFormer relies on a band-split module to project the input complex spectrogram into subband-level representations, and then applies a stack of hierarchical Transformers to model the inner-band as well as inter-band sequences for multi-band mask estimation. To facilitate training the model for MSS, we propose to use the Rotary Position Embedding (RoPE). The BS-RoFormer system trained on MUSDB18HQ and 500 extra songs ranked first in the MSS track of the Sound Demixing Challenge (SDX23). Benchmarking a smaller version of BS-RoFormer on MUSDB18HQ, we achieve a state-of-the-art result without extra training data, with an average SDR of 9.80 dB. Comment: This paper describes the SAMI-ByteDance MSS system submitted to the Sound Demixing Challenge (SDX23) Music Separation Track
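    BS-RoFormer's actual band partition and hierarchical Transformer stack are not reproduced here; the PyTorch sketch below shows only the band-split projection idea, mapping each subband of a complex spectrogram to a shared embedding. The band sizes, embedding dimension, and class name are made up for illustration.

```python
import torch
import torch.nn as nn

class BandSplit(nn.Module):
    """Project each frequency band of a complex spectrogram to a shared embedding size."""
    def __init__(self, band_sizes, dim=128):
        super().__init__()
        self.band_sizes = band_sizes
        # One linear layer per band; its input is the band's real and imaginary bins, flattened.
        self.projs = nn.ModuleList([nn.Linear(2 * b, dim) for b in band_sizes])

    def forward(self, spec):                                    # spec: (batch, time, freq), complex
        feats, start = [], 0
        for size, proj in zip(self.band_sizes, self.projs):
            band = spec[..., start:start + size]
            band = torch.cat([band.real, band.imag], dim=-1)    # (batch, time, 2*size)
            feats.append(proj(band))
            start += size
        return torch.stack(feats, dim=-2)                       # (batch, time, n_bands, dim)

# Example: a 1025-bin spectrogram split into four coarse, hypothetical bands.
spec = torch.randn(1, 100, 1025, dtype=torch.cfloat)
out = BandSplit(band_sizes=[257, 256, 256, 256], dim=128)(spec)  # -> (1, 100, 4, 128)
```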

    An Improved di/dt-RCD Detection for Short-Circuit Protection of SiC MOSFET

